Stem Translation with Affix-Based Rule Selection for Agglutinative Languages
نویسندگان
چکیده
Current translation models are mainly designed for languages with limited morphology, which are not readily applicable to agglutinative languages as the difference in the way lexical forms are generated. In this paper, we propose a novel approach for translating agglutinative languages by treating stems and affixes differently. We employ stem as the atomic translation unit to alleviate data spareness. In addition, we associate each stemgranularity translation rule with a distribution of related affixes, and select desirable rules according to the similarity of their affix distributions with given spans to be translated. Experimental results show that our approach significantly improves the translation performance on tasks of translating from three Turkic languages to Chinese.
منابع مشابه
Coalescence Type based Confidence Warping for Agglutinative Language Keyword Spotting
In agglutinative languages like Korean, words are formed by joining l affix morphemes to the stem, which leads to high OOV rate in dictionary building. Hence, subword units are usually used as basic language modeling units in Large-Vocabulary Continuous Speech Recognition (LVCSR) or LVCSR based applications such as keyword spotting. In this work, firstly a new word property called coalescence t...
متن کاملAn Unsupervised Morpheme-Based HMM for Hebrew Morphological Disambiguation
Morphological disambiguation is the process of assigning one set of morphological features to each individual word in a text. When the word is ambiguous (there are several possible analyses for the word), a disambiguation procedure based on the word context must be applied. This paper deals with morphological disambiguation of the Hebrew language, which combines morphemes into a word in both ag...
متن کاملMorphological Analyzer for Affix Stacking Languages: A Case Study of Marathi
In this paper we describe and evaluate a Finite State Machine (FSM) based Morphological Analyzer (MA) for Marathi, a highly inflectional language with agglutinative suffixes. Marathi belongs to the Indo-European family and is considerably influenced by Dravidian languages. Adroit handling of participial constructions and other derived forms (Krudantas and Taddhitas) in addition to inflected for...
متن کاملAn Affix Stripping Morphological Analyzer for Turkish
This paper presents the design and the implementation of a morphological analyzer for Turkish. A new methodology is proposed for doing the analysis of Turkish words with an affix stripping approach and without using any lexicon. The rule-based and agglutinative structure of the language allows Turkish to be modeled with finite state machines (FSMs). In contrast to the previous works, in this st...
متن کاملAn Attribute-Sample Database System for Describing Chuvash Affixes
In the paper is described “KÜLEPEK” – a database system created by the author for description of Chuvash word-changing and word-forming affixes. The system is based on the attribute-sample model of affixes, which allows to describe affixes and their phonological, morphotactic and orthography rules. The system uses DBase IV database engine and was created in Borland Delphi 7.0 software environme...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2013